Conversation
Summary of Changes

Hello @dmga44, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances write performance and consistency in the system by introducing write buffering and chunkserver-side chunk locking. These changes allow chunkservers to process write operations more efficiently, provide faster acknowledgments to clients, and enable the master to maintain a more accurate state of chunk parts being written. The refactoring of write handling logic and updates to communication protocols lay the groundwork for more robust and performant distributed storage operations.

Highlights
Changelog
Activity
Code Review
This pull request introduces a significant performance enhancement through chunkserver-side write buffering and instant replies, underpinned by a new chunk locking mechanism. The changes are extensive, touching the master, chunkserver, and protocol layers. The core idea is to allow chunkservers to acknowledge writes to the client before the data is persisted to disk, which is made safe by locking the chunk on the master to prevent conflicting operations. This required substantial modifications to WriteHighLevelOp, ChunkserverEntry, and JobPool. The read path has also been updated to correctly serve data from these in-memory write buffers. A major and very welcome part of this PR is the significant refactoring of chunk_multi_modify in chunks.cc into smaller, more understandable functions, which greatly improves code clarity and maintainability. The changes appear well-thought-out, especially regarding consistency and error handling in this new asynchronous model. I have one suggestion for a minor improvement regarding type safety in job functions.
Force-pushed 3471071 to 209e321.
Force-pushed 2ad5155 to 3394f92.
@gemini-code-assist can you please review this PR?
Code Review
This pull request introduces a write buffering mechanism to improve write performance by allowing instant replies to clients for non-standard slices, provided buffering capacity is available. Key changes include refactoring WriteHighLevelOp to manage multiple concurrent write operations per ChunkserverEntry using a list, and adding a tryInstantReply() method. A new MasterJobPool::changeLockJobsCallback function and related masterconn utilities were added to handle master server disconnections by updating lock job callbacks and reporting lost chunks if errors occur. The hddspacemgr now includes mechanisms (hddInsertAlreadyRepliedInputBuffer, hddRemoveAlreadyRepliedInputBuffer, hddUpdateOutputBufferWithAlreadyRepliedInputBuffers) to store and retrieve already replied write data, enabling reads to be patched from these buffers before data is flushed to disk. The io_buffers module was updated with an atomic counter (gAvailableWriteBufferingBlocks) to manage total write buffering capacity, and InputBuffer now tracks repliedBlocks. Configuration for WRITE_BUFFERING_SIZE_MB was added, and several system tests were updated or created to validate the new functionality. Review comments highlighted a potential race condition in inputBuffer->writeInfo_ due to concurrent modification and access, suggested optimizing a range-based for loop in WriteHighLevelOp::trySeal() by using a const reference, recommended adding comments to clarify complex logic in hddUpdateOutputBufferWithAlreadyRepliedInputBuffers, and pointed out duplicated code for WRITE_BUFFERING_SIZE_MB configuration loading.
Pull request overview
This PR introduces chunkserver-side write buffering (configurable via WRITE_BUFFERING_SIZE_MB) with logic to allow early client replies and to patch subsequent reads with buffered-but-not-yet-flushed data. It also extends the test suite to cover full-disk metadata behavior with write buffering enabled/disabled.
Changes:
- Add write-buffering capacity tracking and config reload/init wiring for WRITE_BUFFERING_SIZE_MB.
- Implement "instant reply" / buffered-write flow in chunkserver write high-level ops and patch reads using already-replied write buffers.
- Add/adjust system tests to validate behavior when disk space is depleted with write buffering on/off.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 6 comments.
Summary per file:
| File | Description |
|---|---|
| tests/tools/saunafs.sh | Sets a default WRITE_BUFFERING_SIZE_MB in generated chunkserver configs for tests. |
| tests/test_suites/ShortSystemTests/test_wb_disable_empty_folders_metadata_after_full_disk.sh | New test: full-disk behavior with WB enabled and empty-folder creation disabled. |
| tests/test_suites/ShortSystemTests/test_wb_allow_empty_folders_metadata_after_full_disk.sh | New test: full-disk behavior with WB enabled and empty-folder creation allowed. |
| tests/test_suites/ShortSystemTests/test_disable_empty_folders_metadata_after_full_disk.sh | Updates test to explicitly disable WB. |
| tests/test_suites/ShortSystemTests/test_allow_empty_folders_metadata_after_full_disk.sh | Updates test to explicitly disable WB. |
| tests/test_suites/GaneshaTests/test_nfs_ganesha_disable_empty_folders_metadata_after_full_disk.sh | Updates NFS-Ganesha test to explicitly disable WB. |
| tests/test_suites/GaneshaTests/test_nfs_ganesha_allow_empty_folders_metadata_after_full_disk.sh | Updates NFS-Ganesha test to explicitly disable WB. |
| src/chunkserver/network_worker_thread.h | Adds global/default for write-buffering size config. |
| src/chunkserver/network_worker_thread.cc | Calls per-loop write maintenance (everyLoopUpdateWrite). |
| src/chunkserver/network_main_thread.cc | Loads/reloads WRITE_BUFFERING_SIZE_MB and updates available buffering blocks. |
| src/chunkserver/masterconn.cc | Adjusts lock-job callbacks on master disconnect; reports lost chunks after delete jobs. |
| src/chunkserver/io_buffers.h | Adds buffer patching helpers, write-buffering block accounting APIs, and new InputBuffer accessors. |
| src/chunkserver/io_buffers.cc | Implements write-buffering block accounting + new buffer patching helpers. |
| src/chunkserver/hddspacemgr.h | Exposes lost-chunk reporting + insert/remove already-replied write buffers for read patching. |
| src/chunkserver/hddspacemgr.cc | Stores already-replied write buffers and patches reads from them before returning to client. |
| src/chunkserver/chunkserver_entry.h | Tracks pending write jobs, supports multiple concurrent write HLOs, adds per-loop write maintenance hook. |
| src/chunkserver/chunkserver_entry.cc | Switches to a list of write HLOs, adds lifecycle/close handling, and integrates instant-reply updates. |
| src/chunkserver/chunk_high_level_ops.h | Adds sealing/instant-reply APIs and tracks delayed-close/sealed state. |
| src/chunkserver/chunk_high_level_ops.cc | Implements sealing + instant replies + delayed-close behavior and read-patching integration. |
| src/chunkserver/bgjobs.h | Adds a callback-maker type and API to change lock-job callbacks. |
| src/chunkserver/bgjobs.cc | Implements changeLockJobsCallback to update callbacks for lock jobs on disconnect. |
Force-pushed 3394f92 to b66c491.
@gemini-code-assist |
Code Review
This pull request introduces an experimental write buffering feature in the chunkserver to improve write performance, especially for small files. However, it also introduces several critical and high-severity security vulnerabilities in the chunkserver's write handling logic. These include a process crash when receiving out-of-order packets, resource exhaustion through unbounded object creation, an information leak due to insufficient validation of untrusted packet fields, and a logic error in the new write buffering mechanism that can lead to data corruption for unaligned read requests. Furthermore, enabling this experimental feature by default across the entire test suite could introduce instability in existing tests. These issues must be addressed before the changes are merged.
Pull request overview
Copilot reviewed 24 out of 24 changed files in this pull request and generated 8 comments.
Resolved (outdated) review threads on:
- tests/test_suites/ShortSystemTests/test_wb_allow_empty_folders_metadata_after_full_disk.sh
- tests/test_suites/ShortSystemTests/test_disable_empty_folders_metadata_after_full_disk.sh
- tests/test_suites/ShortSystemTests/test_allow_empty_folders_metadata_after_full_disk.sh
- tests/test_suites/ShortSystemTests/test_wb_disable_empty_folders_metadata_after_full_disk.sh
- .../test_suites/GaneshaTests/test_nfs_ganesha_disable_empty_folders_metadata_after_full_disk.sh
- tests/test_suites/GaneshaTests/test_nfs_ganesha_allow_empty_folders_metadata_after_full_disk.sh
Force-pushed b66c491 to 001f505.
@gemini-code-assist |
Code Review
This pull request introduces a significant performance enhancement with the new write buffering feature in the chunkserver. While the implementation aims for robustness and includes a valuable fix for chunk lock handling, it unfortunately introduces several critical and high-severity security vulnerabilities. Specifically, a critical race condition in the handling of buffered write information can lead to Use-After-Free (UAF) vulnerabilities and crashes, reinforced by existing guidelines on shared resource synchronization. Furthermore, logic flaws in the data patching mechanism for read operations could allow an attacker to trigger assertion failures, resulting in a Denial of Service (DoS) of the chunkserver. Additionally, a minor code refactoring suggestion has been made for better maintainability. Addressing these security vulnerabilities is paramount before merging this feature.
Pull request overview
Copilot reviewed 24 out of 24 changed files in this pull request and generated 3 comments.
Force-pushed 001f505 to 004f1a3.
@gemini-code-assist |
Code Review
This pull request introduces a significant performance enhancement with the write buffering feature in the chunkserver. The implementation is comprehensive, touching upon I/O handling, state management, and error recovery. I appreciate the thoroughness, including the fix for chunk lock handling during master disconnections and the refactoring to support multiple write operations, which improves code clarity. The addition of new tests specifically for write buffering is excellent. I've found one issue related to incorrect validation of write offsets, which I've detailed in a specific comment.
Note: Security Review did not run due to the size of the PR.
Pull request overview
Copilot reviewed 24 out of 24 changed files in this pull request and generated 4 comments.
@gemini-code-assist |
This pull request introduces a significant performance enhancement to the chunkserver by implementing write buffering and chunkserver-side chunk locking. The primary goal is to improve write performance and consistency, particularly by reducing client-perceived latency for write operations.

Problem Addressed:

Solution: Write Buffering

Key aspects of the write buffering implementation include:

Constraints for Write Buffering:

Solution: Chunkserver-Side Chunk Locking

This involves:

Configuration:

Testing:

This feature, tracked under LS-315, significantly enhances the chunkserver's ability to handle write operations efficiently and robustly.
Force-pushed 749e769 to 6cdce8d.
The current sequence of operations after the master connection gets killed (either a network error or an actual master crash/unexpected stop) is flawed in the following way:
- there is a write error (the status passed in the write end status) on a locked chunk while the master is disconnected;
- after the master comes back, the chunk is registered as if it were a good one, while it should have broken data.

The proposed solution is to change the callback of those lock jobs so that the chunk parts with broken data get deleted and unregistered in the master once the master is back.

Signed-off-by: Dave <dave@leil.io>
Force-pushed 5f59125 to 8a406c0.
This commit introduces the write buffering feature to the chunkserver. It changes the chunkserver behavior so that it fast-replies to write data requests before the data is actually on the drives, as would normally be expected. Before a write data request is replied to:
- the sequence of packets sent from the client must be preserved, i.e. WRITE_INIT must be sent first;
- the job open at the beginning of the write operation must be processed;
- all the data to be written from the write data request must have been read from the network;
- there must be available write buffering blocks; this counter depends on the new option WRITE_BUFFERING_SIZE_MB and is maintained when clearing and destroying InputBuffer instances;
- the chunk must be locked: the master must have enabled the USE_CHUNKSERVER_SIDE_CHUNK_LOCK option. This constraint is especially important for handling write errors that may occur on chunks supposedly written successfully.

Standard chunks do not qualify for these fast replies.

Given the fast replies, some WriteHighLevelOp instances might receive WRITE_END packets before all the data they should write is actually written to disk. In such cases, those instances become sealed:
- no more network IO related to that write high-level operation;
- they wait until the buffered data is actually synced before disappearing.

The writeFinishedCallback handles the subsequent writes of the already pending input buffers; see also continueWritingIfPossible. This explains why the ChunkserverEntry now has a list of write high-level operations instead of a single one; all the writeHLO instances except the last one must be sealed.

Some write operations in the client assume a previous block is written and attempt to read it back to rebuild the parity parts. Those blocks may not actually be written to disk yet, so the reads need to be patched with the data allegedly synced to the drives, i.e. the already buffered data.

Side changes:
- some tests needed to be patched to make them pass, like the ones testing the option CREATE_EMPTY_FOLDERS_WHEN_SPACE_DEPLETED;
- created a loadReloadableSettings function to avoid code duplication.

Signed-off-by: Dave <dave@leil.io>
Force-pushed 8a406c0 to 9338ff1.
Quoted diff (test assertions):

    for f in test/*; do
        assert_eventually_prints "" "saunafs fileinfo '$f' | grep ':${info[chunkserver0_port]}'"
        assert_eventually_prints 2 "saunafs fileinfo '$f' | grep copy | wc -l"
        # Assert that data is replicated to chunkservers 1, 2 and no chunk is stored on cs 0
Quoted diff (chunkserver config documentation):

    ## Caps the amount of data in MB the chunkserver "buffers", i.e fast replies to the
    ## client as if it was synced to the drives but it was only read from network and
    ## stored in memory. Similar to page cache. This feature has the following constraints:
    ## - does not work on standard and replica chunks
Consider:
standard (replica) chunks
Quoted diff (test setup):

     USE_RAMDISK=YES \
    -CHUNKSERVER_EXTRA_CONFIG="READ_AHEAD_KB = 1024|MAX_READ_BEHIND_KB = 2048" \
    +CHUNKSERVER_EXTRA_CONFIG="READ_AHEAD_KB = 1024|MAX_READ_BEHIND_KB = 2048 \
    +|WRITE_BUFFERING_SIZE_MB = 0" \
Why was this needed here?
Quoted diff (sfschunkserver.cfg documentation):

    starving the other.
    Not reloadable. (Default: FIFO)

    *WRITE_BUFFERING_SIZE_MB (EXPERIMENTAL)*:: Caps the amount of data in MB the chunkserver
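For reference, the setting would be used in the chunkserver config roughly as below. The value 256 is an arbitrary example, not a recommended default; the tests in this PR set it to 0 to disable write buffering.

```
# sfschunkserver.cfg (sketch; 256 MB is an arbitrary example value)
WRITE_BUFFERING_SIZE_MB = 256

# Setting it to 0 disables write buffering, as done in the updated tests:
# WRITE_BUFFERING_SIZE_MB = 0
```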
Quoted diff (hddspacemgr.cc):

        hddAddErrorAndPreserveErrno(chunk);
        safs::log_warn("{}: file:{} - write error", errorMsg,
    -                  chunk->fullMetaFilename().c_str());
    +                  chunk->fullDataFilename().c_str());
The .c_str() should not be needed anymore with the new logging functions.
This commit introduces the write buffering feature to the chunkserver. It changes the chunkserver behavior so that it fast-replies to write data requests before the data is actually on the drives, as would normally be expected. Before a write data request is replied to:
- the sequence of packets sent from the client must be preserved, i.e. `WRITE_INIT` must be sent first;
- the job open at the beginning of the write operation must be processed;
- all the data to be written from the write data request must have been read from the network;
- there must be available write buffering blocks; this counter depends on the new option `WRITE_BUFFERING_SIZE_MB` and is maintained when clearing and destroying `InputBuffer` instances;
- the chunk must be locked: the master must have enabled the `USE_CHUNKSERVER_SIDE_CHUNK_LOCK` option. This constraint is especially important for handling write errors that may occur on chunks supposedly written successfully.

Standard chunks do not qualify for these fast replies.

Given the fast replies, some `WriteHighLevelOp` instances might receive `WRITE_END` packets before all the data they should write is actually written to disk. In such cases, those instances become sealed:
- no more network IO related to that write high-level operation;
- they wait until the buffered data is actually synced before disappearing.

The `writeFinishedCallback` handles the subsequent writes of the already pending input buffers; see also `continueWritingIfPossible`. This explains why the `ChunkserverEntry` now has a list of write high-level operations instead of a single one; all the writeHLO instances except the last one must be sealed.

Some write operations in the client assume a previous block is written and attempt to read it back to rebuild the parity parts. Those blocks may not actually be written to disk yet, so the reads need to be patched with the data allegedly synced to the drives, i.e. the already buffered data.

This PR also fixes the chunkserver chunk locks when the master disconnects, because the current sequence of operations after the master connection gets killed (either a network error or an actual master crash/unexpected stop) is flawed:
- there is a write error (the status passed in the write end status) on a locked chunk while the master is disconnected;
- after the master comes back, the chunk is registered as if it were a good one, while it should have broken data.

The proposed solution is to change the callback of those lock jobs so that the chunk parts with broken data get deleted and unregistered in the master once the master is back.

Side changes:
- some tests needed to be patched to make them pass, like the ones testing the option `CREATE_EMPTY_FOLDERS_WHEN_SPACE_DEPLETED`;
- created a `loadReloadableSettings` function to avoid code duplication.

Related to LS-315.
To the reviewers, some conversations with Gemini and also its summary could prove useful while reviewing.
Signed-off-by: Dave dave@leil.io